Leading consumption patterns of psychoactive substances in Colombia: A deep neural network-based clustering-oriented embedding approach

The number of health-related incidents caused using illegal and legal psychoactive substances (PAS) has dramatically increased over two decades worldwide. In Colombia, the use of illicit substances has increased up to 10.3%, while the consumption alcohol and tobacco has increased to 84% and 12%, respectively. It is well-known that identifying drug consumption patterns in the general population is essential in reducing overall drug consumption. However, existing approaches do not incorporate Machine Learning and/or Deep Data Mining methods in combination with spatial techniques. To enhance our understanding of mental health issues related to PAS and assist in the development of national policies, here we present a novel Deep Neural Network-based Clustering-oriented Embedding Algorithm that incorporates an autoencoder and spatial techniques. The primary goal of our model is to identify general and spatial patterns of drug consumption and abuse, while also extracting relevant features from the input data and identifying clusters during the learning process. As a test case, we used the largest publicly available database of legal and illegal PAS consumption comprising 49,600 Colombian households. We estimated and geographically represented the prevalence of consumption and/or abuse of both PAS and non-PAS, while achieving statistically significant goodness-of-fit values. Our results indicate that region, sex, housing type, socioeconomic status, age, and variables related to household finances contribute to explaining the patterns of consumption and/or abuse of PAS. Additionally, we identified three distinct patterns of PAS consumption and/or abuse. At the spatial level, these patterns indicate concentrations of drug consumption in specific regions of the country, which are closely related to specific geographic locations and the prevailing social and environmental contexts. These findings can provide valuable insights to facilitate decision-making and develop national policies targeting specific groups given their cultural, geographic, and social conditions.

Fraley and Raftery [54] suggest separating clustering approaches into hierarchical and partitioning techniques. Partitioning techniques are divided into density-, model-, and grid-based methods, the most popular of which are K-means, PAM, CLARA, DBSCAN and CLIQUE. On the other hand, hierarchical techniques are divided into agglomerative and divisive methods. Of these, the best-known methods are BRICH, CURE, ROCK, and CHAMELEON (see [55] for further reading). Although these techniques have been shown to perform well when relevant features are removed a priori, it is well-known that in clustering algorithms, irrelevant and redundant features in the data may degrade the quality of clusters and lead to high computational cost. Therefore, removing such features may alleviate these issues. Thus, we focus on identifying patterns of PAS consumption using an ensemble model integrating an autoencoder with both a clustering algorithm and a spatial model. As part of our approach, we used the most recent and representative works for data clustering, and different dimensionality reduction and feature selection methodologies proposed in the literature.
Feature selection approaches in clustering can be split into filter, wrapper, embedded, and hybrid approaches [37]. While wrappers depend on the clustering algorithms to evaluate the clustering quality of a selected feature subset, filters are independent of the clustering algorithm. Embedded approaches also work with a clustering algorithm and, unlike wrappers, incorporate knowledge about the clustering structure. Another type of method is hybrid approaches, which combine filter and wrapper approaches into a single strategy. However, studies on embedded and hybrid feature selection approaches in clustering are limited [37]. Other feature learning-based approaches using Deep Neural Networks have been shown to work well for linear and nonlinear models [56]. For instance, Xie et al. in (2016) [57] propose to work on feature extraction and clustering using pre-trained Auto-Encoders simultaneously. However, these are mainly used to work and process images. In general, deep clustering models use Auto-Encoders since they can learn input features without labels on the data; performance measures show that this approach is reliable for different data types [58]. Thus, deep clustering methods have become a growing field of research for feature selection [58]. In this regard, the use of convolutional networks in autoencoders and the application of feature selection for clustering are open questions that have not been fully addressed yet, especially when dealing with data from different statistical distributions [37].
Here, we propose a Deep Neural Network-based Clustering-oriented Embedding Algorithm that allows us to (i) identify consumption patterns of PAS; and (ii) build an ensemble algorithm integrating an autoencoder with a clustering algorithm and a spatial model to deal with the feature space and cluster memberships. Our approach is based on the model proposed by Xie et al. [57] and B. Li et al. [56], and expands their work by creating an autoencoder from a convolutional network to represent high-order interactions in the data accurately and simultaneously incorporate a spatial analysis to describe drug consumption patterns properly. Our main hypothesis is that incorporating these two critical elements in our proposal will help to identify and better understand drug consumption patterns and support national policy development processes.

Study area
Located in South America, the Republic of Colombia is a diverse country with a population of over 50 million people distributed over a territory of 440,831 square miles [59], encompassing jungles, highlands, grasslands, deserts, coasts, and islands, distributed in six regions and 32 departments (states) [60] (See S1 Fig). It is worth noting that, unfortunately, Colombia has been a major producer of illegal drugs for a long time, which has had a significant impact on drug consumption and abuse.
According to the United Nations Office on Drugs and Crime, Colombia is the first cocaineproducing country and the eighth country with the highest production of cannabis [61]. In addition, the Colombian Drug Observatory indicates that the use of illicit substances in the territory has increased to 10.3%, with men between the ages of 18 and 24 being the heaviest consumers of these types of drugs. Reports also indicate that consumption of licit substances such as alcohol and tobacco has recently increased dramatically [62].

Data sources
We used two databases to identify drug consumption patterns in Colombia. The first database was retrieved from the 2019 National Survey of Psychoactive Substance Consumption in the General Population (DANE-DIMPE-ENCSPA-2019; URL: https://microdatos.dane.gov.co/ index.php/catalog/680/data-dictionary) conducted by the National Statistical System (DANE) of Colombia [63]. This survey includes observations of 49,600 households, where information on housing, location, general characteristics of individuals, consumption of legal and illegal PAS, and implemented treatments is registered. The second database comes from the Colombian Drug Observatory and contains information on the production of PAS per area during 2019. All these databases are fully available and completely anonymized. In this study, we used departments (states) as georeferenced areas using polygons (i.e., a shapefile) as implemented in ArcGIS Hub [64]. Thus, an ethics statement approved by an ethics committee is not required since we are using public information without the identification or individual information of the people involved.  [57]. However, unlike the Xie et al. model, our structure is developed by applying convolutional layers for the deep autoencoder (DA) architecture instead of a linear one to represent high-order interactions in the data. In addition, a spectral clustering-based centroid estimation is proposed to achieve an improved initial centroid calculation. We chose the CAE-DEC framework based on its ability to reduce both the number of model parameters and the dimensionality, while creating clusters simultaneously.

Convolutional Auto-Encoder-Deep Embedded Clustering algorithm
In our approach, an encoder structure is first applied to map the input vector into a lower feature space, called latent feature space (LFS). Then, the LFS is independently passed through a decoder structure and a clustering layer to achieve an efficient clustering framework. The encoder-decoder combination (DA) attempts to extract a LFS preserving the relevant information from the original input data. On the other hand, the clustering layer seeks to execute an improved clustering assignment by minimizing the divergence between a target distribution and a centroid-based probability distribution.
In the last stage of the framework, a spatial analysis was performed using the feature space generated from the autoencoder as input. Here, the spatial data exploration is initially performed using Global Spatial Autocorrelation to determine to which level the similarity between observations in a dataset relates to the similarity of the locations of such observations [65]. To assess GSA, the Moran's I [66], Geary's C [67], and Getis and Ord's G [68] statistics are estimated. We also measure the Local Spatial Autocorrelation, which focuses on the relationships between each observation and its surroundings, rather than providing a single-number summary of these relationships across the map [69]. This is estimated based on the ability to determine whether spatial autocorrelation is present in a geographically referenced data set. Finally, we perform regionalization, which corresponds to a special kind of clustering where the objective is to group similar observations based on their statistical attributes and spatial location [70]. In this sense, regionalization embeds the same logic as standard clustering techniques while applying a series of geographical constraints [71]. This framework was built using the TensorFlow (https://www.tensorflow.org/) and PyTorch (https://pytorch.org/) libraries in Python version 3.11 [72].

Convolutional Auto-Encoder (CAE)
The DA is a deep neural network architecture capable of learning unsupervised representations of an input data set. Typically, DA networks are used for dimensionality reduction or denoising tasks. The structure of a DA is based on two deep networks: a network to transform the original input data into a latent feature space, and a network trained to reconstruct the original input data using the extracted latent space as input. The first network, used to extract the latent space, is called the encoder, while the second is called the decoder. Rather than using fully connected layers, the implemented DA architecture incorporates convolutional (CONV) layers and fully connected (FC) layers for LFS extraction and reconstruction (Fig 1). Integrating convolutional layers in a DA is also called CAE [73]. Compared to a DA, which is built with only fully connected layers, the CAE structure can reduce the number of parameters compared to a DA [74].
2.4.1 Convolutional layer. The proposed CAE structure is designed using four convolutional layers, two CONV layers during the encoder stage, two CONV layers during the decoder stage, and two fully connected layers (Fig 1). The convolution operation can be denoted as: where z l i;j;k is the value of each feature at the (i, j) location in the k-th feature map of the l-th layer, W l k and b l k represent the weight and bias of the k-th filter of the l-th layer, and X l i;j denotes the input value at location (i, j) of the l-th layer. For non-linear mapping, an activation function g(.) is applied over the convolutional feature z l i;j;k as follows: where a l i;j;k is the activation value resulting from applying the activation function g(.). The Rectified Linear Unit (ReLU) function is set as the activation function on each CONV, except in the final decoder CONV layer where a sigmoidal activation is applied.

Clustering layer
The clustering layer is inspired by Xie et al. [57]. Initially, a soft assignment is computed between the latent space, also known as embedded space, and the cluster centroids. Then, update steps are repeated to define the final cluster centroids and embedded space. The Kullback-Leibler (KL) divergence is used as loss function during the optimization procedure. The objective is to minimize de KL divergence between a soft clustering distribution Q and an auxiliar target distribution P. The KL loss is calculated as: where L c is the clustering loss. To measure the similarity between embedded point z k and the cluster centroid c m , the t Student's distribution is used as a kernel: with α the degrees of freedom of the t Student's distribution and q km is a soft clustering assignment distribution of each embedded point (i.e., probability of assigning point k to cluster m). As in Xie et al., when setting α = 1 the similarity function q km can be calculated as: To compute the target distribution p km , the second power of q km is calculated, and a cluster normalization is applied as follows: Then, by minimizing the divergence between P and Q, the embedding learning is achieved through highly confident assignments.
2.5.1 Center initialization. As previously mentioned, the cluster centroids are initialized using a spectral clustering-based approach. The spectral clustering allows flexible distance metrics and provides better cluster estimations than K-means [57]. However, most spectral clustering algorithms have high computational requirements. To overcome these computational requirements, random samples are taken to estimate the cluster centroids. As spectral clustering does not estimate any centroid during the learning process, once the clusters are defined, the mean of each cluster is used as the centroid estimator.

The CAE-DEC model
Initially, the input data is normalized within the interval [0, 1]. This normalization allows the network to use the most advanced learning rate and avoid the vanishing gradient problems, as well as alleviate overfitting. Further, to achieve a better learning process, the last CONV layer in the decoder structure is activated by a sigmoid activation function. Then, two training steps will be executed during the CAE-DEC learning process. Firstly, a CAE model will be trained to minimize the reconstruction loss L r computed as where x is the normalized input andx is the reconstructed output. This pretrained CAE model is then used as the DA structure in the CAE-DEC model.
In the second step, the CAE-DEC model is trained to simultaneously minimize reconstruction loss and clustering loss. The total loss during this training step will be set as where L r is the CAE-DEC reconstruction loss, L c is the CAE-DEC clustering loss, and C is a coefficient to control the loss balance. The training process is shown in Table 1. The goal is to obtain a latent space that minimizes the total loss. Finally, the label of each embedded point is established as where q jm is the probability that point j belongs to a specific cluster center m. On the other hand, the maximum number of iterations Mint and the target distribution P update condition P_change was chosen based on multiple experiments. The final Mint and P_change values were 3000 and 5, respectively. This final P_change improves stability during the training process.

Framework evaluation
We trained the CAE-DEC method using data retrieved from the National Survey of Psychoactive Substance Consumption (DANE-DIMPE-ENCSPA-2019), which contains 49,600 observations. A second database with PAS production figures, was used in the spatial analysis stage to correlate the PSA consumption and production. In order to evaluate the framework, we compared our CAE-DEC approach with other approaches, including CAE, and Principal Component Analysis integrated with clustering (PCA-DEC). For evaluation and comparison purposes, we use the Calinski-Harabasz [75], Davies-Bouldin [76], and Silhouette [77] index as intrinsic clustering metrics. In addition, we used the χ 2 statistic to investigate potential associations and differences among the patterns (clusters) identified using our approach. On the other hand, the reconstruction loss obtained through the CAE model is higher than that of the CAE-DEC model. This result may be related to the fact that the CAE-DEC model used the pre-trained CAE model during its construction. It should be noted that the CAE model alone cannot determine the labels of each point or define clusters in the data. Thus, clusters in Fig 2 were obtained through spectral clustering and were the bases for initializing the centroids in the CAE-DEC model.

Identification of clusters of psychoactive drugs consumption
Here we analyse the patterns in each cluster obtained using the CAE-DEC model. We defined a priority dummy variable Y ij quantifying whether the ith person in household jth has consumed PAS; Y ij = 1 when an individual has never consumed PAS and Y ij = 2 otherwise. Out of the 49468 individuals in the sample, only 5514 (11.15%) consume PAS. Fig 3a and 3b depict,  respectively, the derived cluster structure for individuals consuming PAS and those who reported not consuming, derived from the CAE-DEC model. Our results indicate that individuals in clusters 0 and 2 are more likely to consume some PAS (Fig 3a), while most individuals in cluster 1 do not (Table 2). In particular, 1726 (11.56%) individuals in cluster 0, 392 (3.4%) individuals in cluster 1, and 3396 (14.76%) individuals in cluster 2 have used PAS (Table 2). A χ 2 -based test of independence reveals that the region where individuals are located, age (years), the type of household they live in, their socioeconomic status (SES), and whether they contribute to the household finances are statistically significantly associated with the cluster they belong to ( Table 2). Table 3 shows the adjusted residuals for our model. According to our results, the Central-Eastern region significantly contributes to the Region variable. In this region, the observed value is higher than the expected value in cluster 2, while the observed value is lower than the expected value for cluster 0. Although to a lesser extent, the Llanos Orientales region also significantly contributes the χ 2 statistic. Indeed, this region shows fewer observed individuals than the expected number of individuals in cluster 2 and a higher number observed than expected individuals in clusters 0 and 1 (Table 3).
On the other hand, Gender has a higher-than-expected value of males in clusters 0 and 2, while it is lower in cluster 1. For females, the opposite occurs in cluster 1, and lower values are observed in clusters 0 and 2. Similarly, Housing Type has a higher-than-expected value of individuals living at houses in cluster 1 and a lower-than-expected in cluster 2. Conversely, cluster 2 has more individuals living in apartments, and cluster 1 has the lowest ( Table 3). Regarding SES, a higher-than-expected number of individuals in strata 3, 4, 5, and 6 in cluster 0 were found (Table 3). We also observed a lower-than-expected number of individuals in strata 3, 4, 5, and 6 in cluster 2 and a higher-than-expected number in strata 1, 4 and 6 in cluster 1 (Table 3). Moreover, the age variable shows a higher-than-expected observed value for the (0,20] range in cluster 0. For ages between (20,40] years, cluster 2 has a higher-thanexpected number of individuals. Conversely, there is a lower number of individuals in cluster 1. Finally, the household economy variable results show that cluster 2 has a higher-thanexpected value of individuals contributing to the household finances, and cluster 1 has a lower-than-expected value of individuals not contributing to it. Comparison of Calinski-Harabasz, Davies-Bouldin, and silhouette metrics between a principal component analysis (PCA)based deep autoencoder (PCA-DEC) and our proposed CAE-DEC model indicates the superiority of the latter (S2 Table).

Spatial analysis of psychoactive drugs consumption
Different alternative classification algorithms were used to determine the number of choropleth class limits (i.e., Equal Intervals, Quantiles, Maximum Breaks, Box plot, Head-Tail Breaks, Jenks-Caspall, Fisher-Jenks, and Max-p) and compared using the absolute deviation around class medians optimization criterion (Fig 4). According to our results, the Fisher-Jenks classifier performed better and hence was selected.
Following the same exploratory spatial analysis, we constructed a choropleth with the percentage of PAS use for each of the 32 Colombian departments (Fig 5a). We found that the departments of Arauca, Vichada, Caquetá, Chocó, Magdalena, Cesar, Bolivar, Sucre, Cordoba, and Norte de Santander have low percentages of drug use. However, some of these departments are major drug producers (i.e., Cordoba and Guaviare), according to data from the Drug Observatory of Colombia [78]. Similarly, Putumayo is the department with the highest proportions of PAS use (Fig 5a). The global Moran's I results show the presence of a statistically significant positive global spatial autocorrelation (I = 0.2005, P<0.01). Thus, the null hypothesis that the map is random (i.e., that the map shows more spatial patterns than we would expect if the values had been randomly assigned to a location) is rejected. In addition, other global indices such as Geary's C (C = 0.693, P = 0.003) and Getis and Ord's G (G = 0.800, P = 0.049) confirm the presence of statistically significant global spatial autocorrelation.
To further explore the relationships between each observation and its environment, the Local Indicators of Spatial Association (LISA) were estimated (more information on LISA statistics is provided in S3 Fig). Fig 5b depicts the Moran diagram, indicating each quadrant's positive (or negative) association. Specifically, the high-high (HH) and low-low (LL) quadrants indicate a positive association between high and low drug use. On the other hand, the lowhigh (LH) and high-low (HL) quadrants indicate negative associations with drug use (Fig 5b). Following our results, we found that departments such as Nariño and Cauca belong to the HH cluster. In contrast, la Guajira, Atlántico, Magdalena, Cesar, Norte de Santander, Sucre, and Cordoba belong to the LL. This clustering pattern leads to a statistically significant Moran's I statistic (P-value <0.01). Thus, a little over 39.4% of the departments are considered, by this analysis, to be part of a spatial cluster (i.e., statistically significant with a P-value <5%). We also identified that, among legal drugs, alcohol and tobacco are the most frequently consumed in the national territory (Fig 6a). At the same time, marijuana, followed by non-prescription tranquilizers and Yagé, and a slight consumption of opioids and Poppers, are the most frequently consumed illegal drugs (Fig 6b).
Regarding legal drugs, alcohol has the highest consumption rates in Bogotá, Cundinamarca, and Chocó (Fig 7). However, there is moderately high use in Vaupés, Nariño, Bolívar, Magdalena, La Guajira, and Atlántico. As for energy drinks, consumption is the highest in Casanare and Guaviare and has slightly high uses in Boyacá, Nariño, Risaralda, and Arauca. On the other hand, tobacco has the highest consumption in Cundinamarca but has moderately high uses in Bogotá, Boyacá, Nariño, Casanare, Tolima, Quindío, Risaralda, Guainía, Caldas, and Vaupés. It should be mentioned that the use of these drugs is also present across the country but with a lower incidence (Fig 7).
Concerning illegal drugs, non-prescription tranquilizers and stimulants are most prevalent in Casanare (Fig 8). However, the consumption of tranquilizers is slightly higher in Nariño, while inhalants have the highest consumption in Quindío, followed by Cauca, Caldas, and Nariño. Methylene Chloride has the highest consumption in Cauca and a high consumption in Quindío and Nariño; Antioquia, followed by Caldas and Risaralda, shows the highest consumption of popper. On the contrary, marijuana has its highest consumption in Risaralda and moderately high consumption in Caldas, Bogotá, Antioquia, and Quindío. As for cocaine, its consumption is the highest in Risaralda and moderately high in Antioquia (Fig 8).
On the other hand, basuco (i.e., cocaine paste) has the highest consumption rate in Guaviare, and critical consumption in Nariño, Cauca, Quindío, Antioquia, and Amazonas; ecstasy has its highest consumption in Risaralda, followed by Bogotá and Caldas; heroin consumption is highest in Vaupes, Huila, Cauca, Quindío, and Arauca, and is slightly higher in Casanare; methamphetamine consumption is highest in Casanare and is moderately high in Boyacá; methadone is most widely used in Quindío, but has slightly high levels of use in Valle del Cauca and Caquetá; opioids are most prevalent in Casanare, followed by Sucre; LSD is most prevalent in Bogotá, but has high levels of use in Caldas, Risaralda, Quindío, and Nariño; mushrooms have their highest consumption in Boyacá and have moderately high uses in Quindío, Risaralda, Bogotá, Cauca, and Casanare; Yagé has a higher incidence in Putumayo; cacao sabanero has its highest consumption in Caldas, and has moderate consumption in Cundinamarca, Bogotá, Antioquia, and Quindío; ketamine has the highest consumption in  Casanare, followed by Antioquia; and GHB has the highest consumption in Risaralda, followed by Santander, Valle del Cauca, and Norte de Santander. Finally, 2CB has the highest consumption rate in Risaralda, followed by Caldas. Although the consumption pattern of some departments is not mentioned, there is low and moderate consumption for certain drugs in some of them (Fig 8).

Regionalization of clusters
We applied a regionalization method as a grouping technique for imposing a spatial restriction, i.e., the result of a regionalization algorithm contains clusters with geographically coherent areas and coherent data profiles. Our approach uses a spatially constrained hierarchical clustering algorithm, which identified three clusters representing the consumption of PAS in the country (Fig 9). The number of clusters was estimated based on the average silhouette indexes, the total intra-cluster variance, and dendrograms (S2 Fig). Following our results, cluster 0 is comprised of departments such as La Guajira, Cesar, Atlántico, Magdalena, Norte de Santander, Bolivar, Sucre, and Cordoba, all of them located in the Northern region of the country; cluster 1 is comprised of Antioquia, Santander, Boyacá, Caldas, Risaralda, and Quindío; and cluster 2 is integrated by the remaining departments (Fig 9). When testing geographical coherence, which is the measure that assesses the "compactness" of a given shape, our results indicate that the clusters derived using the regionalization model represent moderately compact regions. In addition, the feature coherence (i.e., goodness-of-fit) test using different metrics showed that our 3-cluster regionalization structure properly fits the data (S1 Table).

Discussion
In this study, we propose and test a Deep Neural Network-based Clustering-oriented Embedding algorithm (i.e., a ML-based model) for identifying psychoactive substance (PAS) use and abuse patterns in Colombia. This model allows the automatic extraction of features from the input data (such as sex, age, socioeconomic status, and housing type) to determine whether an individual has consumed PAS. It then creates clusters in the new data space generated during the learning process, following the methods outlined in [56,57]. After the training process, a latent feature space (LFS) is generated, and the results are subsequently analysed.
We have identified clearly marked clusters where the prevalence of individuals who use or do not use PAS is notable. Additionally, we found that region, sex, housing type, socioeconomic strata, age, and whether individuals contribute to household finances have a statistically significant impact on the clustering structure. These findings are consistent with previous studies aimed at identifying PAS consumption patterns [19,79,80]. Interestingly, when comparing the CAE-DEC model proposed in this study and the CAE-Spectral model using different metrics (i.e., Silhouette statistic, which measures the internal density of each cluster and the distance that separates them from each other, the Calinski-Harabasz index and the Davies-  Table).
Based on our findings, individuals more likely to consume PAS are grouped in cluster 2, while cluster 1 consisted of individuals who did not consume PAS (Table 2). Not surprisingly, a significant proportion of females characterizes cluster 1. In addition, most individuals belong to socioeconomic strata 1, are 40 years old or older, and do not contribute economically to support their household. In contrast, cluster 2 is characterized by a higher proportion of males aged between 20 and 40 in socioeconomical strata 1 and 2, who do not contribute to the household finances (Table 2). Finally, cluster 0 is characterized by a small proportion of males, a higher proportion of individuals in strata 3, 4, 5, and 6, and individuals are more likely to contribute to the household economy (Table 2).
At the level of spatial statistics, we identified that legal drugs such as alcohol have a high prevalence in all regions of Colombia, with a slight tendency to more consumption in coastal areas (Fig 7). In our country, the coastal areas are often popular tourist destinations, and many tourists come to these areas looking for a relaxing experience, which can increase alcohol consumption. Coastal areas typically have warmer temperatures and more sunshine, increasing thirst and making people more likely to consume beverage. Additionally, bars, clubs, and restaurants serve alcoholic beverage due to the high demand from tourists and locals [81,82]. Another characteristic of this area is the fishing and maritime culture. This culture is often associated with hard work and long working hours, and alcohol may be seen as a way to relax and unwind after a tough day at the sea [83]. Finally, this region has 69% urban and 31% rural zones [59]. The level of development, as measured by gross domestic product (GDP), is the third region with significant economic development in the country [84] (S3 Table). Interestingly, the consumption of illegal drugs is lower in the Northern region than in other regions of the country. However, there is a more representative consumption of non-prescription tranquilizers, opioids, ketamine, GHB, and heroin. In particular, the Atlántico department has the highest consumption proportion within this region (Fig 8).
Tobacco consumption is present in all regions, with a higher proportion in the Central region (Eje Cafetero-Antioquia), where climate conditions resemble temperate weather. Also, this region has a diverse consumption pattern, where drugs such as marijuana, popper, cocaine, ecstasy, inhalants, methadone, heroin, LSD, GHB, 2CB, and mushrooms prevail. This region has Colombia's largest cities (i.e., Bogotá and Medellin); Bogotá has the highest population density and is a hub for drug trafficking routes, while Medellín has an unfortunate history of drug cartels and gang violence. Ultimately, this region is comprised of 79% urban areas, and the most developed cities in the country are located there [59,84] (S3 Table).
Energy drinks are more frequently used in the Central-Eastern region, characterized by a continental climate surrounded by flat territory. Our results are in line with the scientific literature suggesting that the location of regions within countries is directly associated with the consumption of PAS [26,[85][86][87]. The consumption of heroin, basuco, non-prescription tranquilizers, stimulants, methamphetamines, opioids, and ketamine characterizes this region. This zone is the second most developed region in the country, and 71% of urban areas [59,84], (S3 Table).
Our findings also show that the Southern region is more likely to consume illegal drugs, including basuco, heroin, and Yagé (Fig 8). One of the main reasons for this result is that, unfortunately, this region has favourable environmental characteristics (i.e., majority rainforest) for their consumption and production, being the second largest illegal drug-producing region in Colombia [78]. Furthermore, this region has the highest percentage of rurality (55%) compared to the other regions, and its level of development is low as measured by the GDP [59,84] (S3 Table).
In the Western region (Pacific), also known as the Pacific region, consumption mostly mainly includes of Methylene Chloride, GHB, heroin, opioids, and methamphetamines. This region (Pacific) is mainly characterized known for its geographical isolation, poverty, and ongoing conflict, which have contributed to the growth of drug production and trafficking in the area. Poverty is one of the main factors driving drug production in the Pacific region, which has led many people to turn to drug cultivation and trafficking for survival. Additionally, the region's rugged terrain and limited infrastructure have made it difficult for the Colombian government to establish a strong presence, allowing drug traffickers to operate with relative impunity [88]. This region has a similar percentage of urban (53%) and rural (47%) populations than the Southern region and ranks second among the regions with the lowest levels of development (S3 Table).
In the Western region, also known as the Pacific region, consumption mainly includes methylene chloride, GHB, heroin, opioids, and methamphetamines. This region is mainly characterized for its geographical isolation, poverty, and ongoing conflict, which have contributed to the growth of drug production and trafficking in the area. Poverty is one of the main factors driving drug production in the Pacific region, which has led many people to turn to drug cultivation and trafficking for survival. Additionally, the region's rugged terrain and limited infrastructure have made it difficult for the Colombian government to establish a strong presence, allowing drug traffickers to operate with relative impunity [88]. This region has a similar percentage of urban (53%) and rural (47%) populations than the Southern region, and ranks second among the regions with the lowest levels of development (S3 Table) [88]. This region has a similar percentage of urban (53%) and rural (47%) populations than the Southern region. On the other hand, this region ranks second among the regions with the lowest levels of development (S3 Table).

Conclusion
In summary, the proposed CAE-DEC model simultaneously integrates a feature extraction process within the clustering design, prioritizing features that improve the separation between groups, thus avoiding the manual extraction of features, which is a frequent process in traditional models. Additionally, a geospatial component is sequentially included to expand the resulting insights by considering geographic constraints. Currently, these types of architectures are scarce in understanding mental health problems. As part of future work, the architecture of the proposed model could be improved to integrate the automatic extraction of features while optimizing a geospatial loss. Following our experience with the proposed CAE-DEC in PAS consumption, the application of this model to other mental health problems, such as suicide, depression, and domestic violence, among other pathologies, could be explored. Based on these results, effective interventions and/or government policies to prevent and/or mitigate their impact could be promoted and evaluated, for example, by developing regional interventions based on the types of drugs most prevalent in the area and the cultural and socio-economic characteristics. This can include education, treatment, and harm reduction programs. Also, this information can be used to develop public health campaigns to raise awareness about the risks of drug use and reduce their negative impact. Furthermore, this information can be used to crack down on drug trafficking and distribution networks. On the other hand, this information can be used to alert healthcare providers and regulatory bodies to take appropriate action to prevent their use and discover new drugs.
Supporting information S1